Abstract



This paper addresses two problems lying at the intersection of geometric analysis and theoretical 
computer science: The non-linear isomorphic Dvoretzky theorem and the design of good approximate 
distance oracles for large distortion. We introduce the notion of Ramsey partitions of a finite metric 
space, and show that the existence of good Ramsey partitions implies a solution to the metric Ramsey 
problem for large distortion (a.k.a. the non-linear version of the isomorphic Dvoretzky theorem, as 
introduced by Bourgain, Figiel, and Milman in [8]). We then proceed to construct optimal Ramsey
partitions, and use them to show that for every $\varepsilon \in (0, 1)$, every $n$-point metric space has a subset of size
$n^{1-\varepsilon}$ which embeds into Hilbert space with distortion $O(1/\varepsilon)$. This result is best possible and improves
part of the metric Ramsey theorem of Bartal, Linial, Mendel and Naor [5], in addition to considerably
simplifying its proof. We use our new Ramsey partitions to design approximate distance oracles with
a universal constant query time, closing a gap left open by Thorup and Zwick in [32]. Namely, we
show that for every $n$-point metric space $X$, and $k \ge 1$, there exists an $O(k)$-approximate distance oracle
whose storage requirement is $O(n^{1+1/k})$, and whose query time is a universal constant. We also discuss
applications of Ramsey partitions to various other geometric data structure problems, such as the design
of efficient data structures for approximate ranking.

1 Introduction 

Motivated by the search for a non-linear version of Dvoretzky's theorem, Bourgain, Figiel and Milman [8]
posed the following problem, which is known today as the metric Ramsey problem: Given a target distortion
$\alpha > 1$ and an integer $n$, what is the largest $k$ such that every $n$-point metric space has a subset of size $k$
which embeds into Hilbert space with distortion $\alpha$? (Recall that a metric space $(X, d_X)$ is said to embed
into Hilbert space with distortion $\alpha$ if there exists a mapping $f : X \to L_2$ such that for every $x, y \in X$, we
have $d_X(x,y) \le \|f(x) - f(y)\|_2 \le \alpha\, d_X(x,y)$.) This problem has since been investigated by several authors,
motivated in part by the discovery of its applications to online algorithms; we refer to [5] for a discussion
of the history and applications of the metric Ramsey problem.

The most recent work on the metric Ramsey problem is due to Bartal, Linial, Mendel and Naor [5], who
obtained various nearly optimal upper and lower bounds in several contexts. Among the results in [5] is the
following theorem which deals with the case of large distortion: For every $\varepsilon \in (0, 1)$, any $n$-point metric
space has a subset of size $n^{1-\varepsilon}$ which embeds into an ultrametric with distortion $O\left(\frac{\log(2/\varepsilon)}{\varepsilon}\right)$ (recall that an
ultrametric $(X, d_X)$ is a metric space satisfying for every $x, y, z \in X$, $d_X(x, y) \le \max\{d_X(x, z), d_X(y, z)\}$). Since
ultrametrics embed isometrically into Hilbert space, this is indeed a metric Ramsey theorem. Moreover, it
was shown in [5] that this result is optimal up to the $\log(2/\varepsilon)$ factor, i.e. there exist arbitrarily large $n$-point
metric spaces, every subset of which of size $n^{1-\varepsilon}$ incurs distortion $\Omega(1/\varepsilon)$ in any embedding into Hilbert
space. The main result of this paper closes this gap:

Theorem 1.1. Let $(X, d_X)$ be an $n$-point metric space and $\varepsilon \in (0, 1)$. Then there exists a subset $Y \subseteq X$ with
$|Y| \ge n^{1-\varepsilon}$ such that $(Y, d_X)$ is equivalent to an ultrametric with distortion at most $128/\varepsilon$.

In the four years that elapsed since our work on [5] there has been remarkable development in the structure
theory of finite metric spaces. In particular, the theory of random partitions of metric spaces has been
considerably refined, and was shown to have numerous applications in mathematics and computer science 
(see for example [17, 25, 24, 1] and the references therein). The starting point of the present paper was
our attempt to revisit the metric Ramsey problem using random partitions. It turns out that this approach 
can indeed be used to resolve the metric Ramsey problem for large distortion, though it requires the in- 
troduction of a new kind of random partition, an improved "padding inequality" for known partitions, and 
a novel application of the random partition method in the setting of Ramsey problems. In Section 2 we
introduce the notion of Ramsey partitions, and show how they can be used to address the metric Ramsey
problem. We then proceed in Section 3 to construct optimal Ramsey partitions, yielding Theorem 1.1. Our
construction is inspired in part by Bartal's probabilistic embedding into trees [4], and is based on a random
partition due to Calinescu, Karloff and Rabani [9], with an improved analysis which strengthens the work
of Fakcharoenphol, Rao and Talwar [17]. In particular, our proof of Theorem 1.1 is self-contained, and
considerably simpler than the proof of the result from [5] quoted above. Nevertheless, the construction of [5]
is deterministic, while our proof of Theorem 1.1 is probabilistic. Moreover, we do not see a simple way
to use our new approach to simplify the proof of another main result of [5], namely the phase transition at
distortion $\alpha = 2$ (we refer to [5] for details, as this result will not be used here). The results of [5] which
were used crucially in our work lETI on the metric version of Milman's Quotient of Subspace theorem are 
also not covered by the present paper. 

Algorithmic applications to the construction of proximity data structures. The main algorithmic
application of the metric Ramsey theorem in [5] is to obtain the best known lower bounds on the competitive
ratio of the randomized $k$-server problem. We refer to [5] and the references therein for more information
on this topic, as Theorem 1.1 does not yield improved $k$-server lower bounds. However, Ramsey partitions
are also useful for obtaining positive algorithmic results, and not only lower bounds; we describe these next.

A finite metric space can be thought of as given by its $n \times n$ distance matrix. However, in many
algorithmic contexts it is worthwhile to preprocess this data so that we store significantly fewer than $n^2$ numbers,
and are still able to quickly find out approximately the distance between two query points. In other words,
quoting Thorup and Zwick [32], "In most applications we are not really interested in all distances, we just
want the ability to retrieve them quickly, if needed". The need for such "compact" representation of metrics
also occurs naturally in mathematics; for example the methods developed in theoretical computer science
(specifically limi20l ) are a key tool in the recent work of Fefferman and Klartag ITSi on the extension of 
$C^m$ functions defined on $n$ points in $\mathbb{R}^d$ to all of $\mathbb{R}^d$.

An influential compact representation of metrics used in theoretical computer science is the approximate
distance oracle [3, 14, 32, 20]. Stated formally, a $(P, S, Q, D)$-approximate distance oracle on a finite metric
space $(X, d_X)$ is a data structure that takes expected time $P$ to preprocess from the given distance matrix,
takes space $S$ to store, and given two query points $x, y \in X$, computes in time $Q$ a number $E(x, y)$ satisfying
$d_X(x,y) \le E(x,y) \le D \cdot d_X(x,y)$. Thus the distance matrix itself is a $(P = O(1), S = O(n^2), Q = O(1), D = 1)$-
approximate distance oracle, but clearly the interest is in compact data structures in the sense that $S = o(n^2)$.
In what follows we will depart from the above somewhat cumbersome terminology, and simply discuss
$D$-approximate distance oracles (emphasizing the distortion $D$), and state in words the values of the other
relevant parameters (namely the preprocessing time, storage space and query time).
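
To make the guarantee $d_X(x,y) \le E(x,y) \le D \cdot d_X(x,y)$ concrete, the following minimal Python sketch checks it for a candidate estimator on a small metric. The function name `satisfies_oracle_guarantee` and the toy matrix are illustrative assumptions and do not appear in the paper.

```python
from itertools import combinations


def satisfies_oracle_guarantee(dist, estimate, D):
    """Check d(x, y) <= E(x, y) <= D * d(x, y) for every pair of points.

    dist     -- symmetric distance matrix as a list of lists (hypothetical input)
    estimate -- callable E(x, y) returning the oracle's answer
    D        -- claimed distortion
    """
    n = len(dist)
    return all(
        dist[x][y] <= estimate(x, y) <= D * dist[x][y]
        for x, y in combinations(range(n), 2)
    )


# Toy 3-point metric: points on a line at positions 0, 1, 2.
toy = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]

# Looking up the matrix itself has distortion D = 1, at the cost of n^2 storage.
exact = lambda x, y: toy[x][y]
```

The check is one-sided on purpose: an estimator that underestimates any distance fails, even if it is within a factor $D$ of the truth.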

An important paper of Thorup and Zwick [32] constructs the best known approximate distance oracles.
Namely, they show that for every integer $k$, every $n$-point metric space has a $(2k - 1)$-approximate distance
oracle which can be preprocessed in time $O(n^2)$, requires storage $O(k \cdot n^{1+1/k})$, and has query time $O(k)$.
Moreover, it is shown in [32] that this distortion/storage tradeoff is almost tight: A widely believed
combinatorial conjecture of Erdős [16] is shown in [32] (see also [26]) to imply that any data structure supporting
approximate distance queries with distortion at most $2k - 1$ must be of size at least $\Omega(n^{1+1/k})$ bits. Since
for large values of $k$ the query time of the Thorup-Zwick oracle is large, the problem remained whether
there exist good approximate distance oracles whose query time is a constant independent of the distortion
(i.e., in a sense, true "oracles"). Here we use Ramsey partitions to answer this question positively: For any
distortion, every metric space admits an approximate distance oracle with storage space almost as good as
the Thorup-Zwick oracle (in fact, for distortions larger than $\Omega(\log n/\log\log n)$ our storage space is slightly
better), but whose query time is a universal constant. Stated formally, we prove the following theorem:

Theorem 1.2. For any $k > 1$, every $n$-point metric space $(X, d_X)$ admits an $O(k)$-approximate distance oracle
whose preprocessing time is $O(n^{2+1/k} \log n)$, requiring storage space $O(n^{1+1/k})$, and whose query time is a
universal constant.

Another application of Ramsey partitions is to the construction of data structures for approximate rank- 
ing. This problem is motivated in part by web search and the analysis of social networks, in addition to being 
a natural extension of the ubiquitous approximate nearest neighbor search problem (see (2] fT7 l and the 
references therein). In the approximate nearest neighbor search problem we are given $c > 1$, a metric space
$(X, d_X)$, and a subset $Y \subseteq X$. The goal is to preprocess the data points $Y$ so that given a query point $x \in X \setminus Y$
we quickly return a point $y \in Y$ which is a $c$-approximate nearest neighbor of $x$, i.e. $d_X(x,y) \le c\, d_X(x, Y)$.
More generally, one might want to find the second closest point to $x$ in $Y$, and so forth (this problem has been
studied extensively in computational geometry, see for example [2]). In other words, by ordering the points
in $X$ in increasing distance from $x \in X$ we induce a proximity ranking of the points of $X$. Each point of $X$
induces a different ranking of this type, and computing it efficiently is a natural generalization of the nearest
neighbor problem. Using our new Ramsey partitions we design the following data structure for solving this
problem approximately:

Theorem 1.3. Fix $k > 1$, and an $n$-point metric space $(X, d_X)$. Then there exists a data structure which can be
preprocessed in time $O(k n^{2+1/k} \log n)$, uses only $O(k n^{1+1/k})$ storage space, and supports the following type
of queries: Given $x \in X$, have "fast access" to a permutation $\pi^{(x)}$ of $X$ satisfying for every $1 \le i < j \le n$,
$d_X(x, \pi^{(x)}(i)) \le O(k) \cdot d_X(x, \pi^{(x)}(j))$. By "fast access" to $\pi^{(x)}$ we mean that we can do the following:

1. Given a point $x \in X$, and $i \in \{1, \ldots, n\}$, find $\pi^{(x)}(i)$ in constant time.

2. For any $x, u \in X$, compute $j \in \{1, \ldots, n\}$ such that $\pi^{(x)}(j) = u$ in constant time.

As is clear from the above discussion, the present paper is a combination of results in pure mathematics, 
as well as the theory of data structures. This exemplifies the close interplay between geometry and computer 
science, which has become a major driving force in modern research in these areas. Thus, this paper "caters" 
to two different communities, and we put effort into making it accessible to both. 

2 Ramsey partitions and their equivalence to the metric Ramsey problem 

Let $(X, d_X)$ be a metric space. In what follows for $x \in X$ and $r \ge 0$ we let $B_X(x, r) = \{y \in X : d_X(x, y) \le r\}$ be
the closed ball of radius $r$ centered at $x$. Given a partition $\mathscr{P}$ of $X$ and $x \in X$ we denote by $\mathscr{P}(x)$ the unique
element of $\mathscr{P}$ containing $x$. For $\Delta > 0$ we say that $\mathscr{P}$ is $\Delta$-bounded if for every $C \in \mathscr{P}$, $\mathrm{diam}(C) \le \Delta$. A
partition tree of $X$ is a sequence of partitions $\{\mathscr{P}_k\}_{k=0}^{\infty}$ of $X$ such that $\mathscr{P}_0 = \{X\}$, for all $k \ge 0$ the partition
$\mathscr{P}_k$ is $8^{-k}\,\mathrm{diam}(X)$-bounded, and $\mathscr{P}_{k+1}$ is a refinement of $\mathscr{P}_k$ (the choice of 8 as the base of the exponent in this
definition is convenient, but does not play a crucial role here). For $\beta, \gamma > 0$ we shall say that a probability
distribution Pr over partition trees $\{\mathscr{P}_k\}_{k=0}^{\infty}$ of $X$ is completely $\beta$-padded with exponent $\gamma$ if for every $x \in X$,

$$\Pr\left[\forall\, k \in \mathbb{N},\ B_X\left(x, \beta \cdot 8^{-k}\,\mathrm{diam}(X)\right) \subseteq \mathscr{P}_k(x)\right] \ge |X|^{-\gamma}.$$

We shall call such probability distributions over partition trees Ramsey partitions.

The following lemma shows that the existence of good Ramsey partitions implies a solution to the metric 
Ramsey problem. In fact, it is possible to prove the converse direction, i.e. that the metric Ramsey theorem 
implies the existence of good Ramsey partitions (with appropriate dependence on the various parameters). 
We defer the proof of this implication to Appendix B, as it will not be used in this paper: in
Section 3 we will directly construct optimal Ramsey partitions.

Lemma 2.1. Let $(X, d_X)$ be an $n$-point metric space which admits a distribution over partition trees which
is completely $\beta$-padded with exponent $\gamma$. Then there exists a subset $Y \subseteq X$ with $|Y| \ge n^{1-\gamma}$ which is $8/\beta$-
equivalent¹ to an ultrametric.

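Proof. We may assume without loss of generality that $\mathrm{diam}(X) = 1$. Let $\{\mathscr{P}_k\}_{k=0}^{\infty}$ be a distribution over
partition trees of $X$ which is completely $\beta$-padded with exponent $\gamma$. We define an ultrametric $\rho$ on $X$ as
follows. For $x, y \in X$ let $k$ be the largest integer for which $\mathscr{P}_k(x) = \mathscr{P}_k(y)$, and set $\rho(x,y) = 8^{-k}$. It is
straightforward to check that $\rho$ is indeed an ultrametric. Consider the random subset $Y \subseteq X$ given by

$$Y = \left\{x \in X :\ \forall\, k \in \mathbb{N},\ B_X\left(x, \beta \cdot 8^{-k}\right) \subseteq \mathscr{P}_k(x)\right\}.$$

Then

$$\mathbb{E}|Y| = \sum_{x \in X} \Pr\left[\forall\, k \in \mathbb{N},\ B_X\left(x, \beta \cdot 8^{-k}\,\mathrm{diam}(X)\right) \subseteq \mathscr{P}_k(x)\right] \ge n^{1-\gamma}.$$

We can therefore choose $Y \subseteq X$ with $|Y| \ge n^{1-\gamma}$ such that for all $x \in Y$ and all $k \ge 0$ we have $B_X(x, \beta \cdot 8^{-k}) \subseteq
\mathscr{P}_k(x)$. Fix $x, y \in X$, and let $k$ be the largest integer for which $\mathscr{P}_k(x) = \mathscr{P}_k(y)$. Then $d_X(x,y) \le
\mathrm{diam}(\mathscr{P}_k(x)) \le 8^{-k} = \rho(x,y)$. On the other hand, if $x \in X$ and $y \in Y$ then, since $\mathscr{P}_{k+1}(x) \ne \mathscr{P}_{k+1}(y)$,
the choice of $Y$ implies that $x \notin B_X(y, \beta \cdot 8^{-k-1})$. Thus $d_X(x,y) > \beta \cdot 8^{-k-1} = \frac{\beta}{8}\rho(x,y)$. It follows that the
metrics $d_X$ and $\rho$ are equivalent on $Y$ with distortion $8/\beta$. □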
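
The two objects used in this proof, the ultrametric $\rho$ and the padded set $Y$, can be computed directly from one sampled partition tree. The Python sketch below is an illustration under stated assumptions: the tree is given as a finite list of partitions (standing in for the infinite sequence, which is harmless when the last level consists of singletons), $\mathrm{diam}(X) = 1$, and the names `cluster_of`, `rho` and `padded_points` are hypothetical.

```python
def cluster_of(partition, x):
    """Return the unique cluster of `partition` (a list of sets) containing x."""
    for cluster in partition:
        if x in cluster:
            return cluster
    raise ValueError(f"{x!r} is not covered by the partition")


def rho(tree, x, y):
    """Ultrametric of the proof: 8^{-k}, k = last level where x and y share a cluster."""
    if x == y:
        return 0.0
    k = 0
    for level, partition in enumerate(tree):
        if y not in cluster_of(partition, x):
            break
        k = level
    return 8.0 ** (-k)


def padded_points(points, dist, tree, beta):
    """Points x whose ball B(x, beta * 8^{-k}) lies inside x's cluster at every level k."""
    result = []
    for x in points:
        padded = all(
            all(y in cluster_of(partition, x)
                for y in points if dist(x, y) <= beta * 8.0 ** (-k))
            for k, partition in enumerate(tree)
        )
        if padded:
            result.append(x)
    return result


# Toy example: three points on a line, diameter 1.
pos = [0.0, 0.1, 1.0]
dist = lambda x, y: abs(pos[x] - pos[y])
tree = [
    [{0, 1, 2}],         # level 0: 1-bounded
    [{0, 1}, {2}],       # level 1: 1/8-bounded
    [{0}, {1}, {2}],     # level 2: 1/64-bounded (singletons)
]
```

On this toy input, a larger $\beta$ asks for more padding and therefore shrinks the set of padded points, which mirrors the trade-off between $\beta$ and $\gamma$ in the lemma.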



3 Constructing optimal Ramsey partitions 

The following lemma gives improved bounds on the "padding probability" of a distribution over partitions 
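which was discovered by Calinescu, Karloff and Rabani in [9].

Lemma 3.1. Let $(X, d_X)$ be a finite metric space. Then for every $\Delta > 0$ there exists a probability distribution
Pr over $\Delta$-bounded partitions of $X$ such that for every $0 < t \le \Delta/8$ and every $x \in X$,

$$\Pr\left[B_X(x,t) \subseteq \mathscr{P}(x)\right] \ge \left(\frac{|B_X(x,\Delta/8)|}{|B_X(x,\Delta)|}\right)^{16t/\Delta}. \qquad (1)$$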



¹ Here, and in what follows, for $D \ge 1$ we say that two metric spaces $(X, d_X)$ and $(Y, d_Y)$ are $D$-equivalent if there exists a bijection
$f : X \to Y$ and a scaling factor $C > 0$ such that for all $x, y \in X$ we have $C d_X(x, y) \le d_Y(f(x), f(y)) \le C D\, d_X(x, y)$.

Remark 3.1. The distribution over partitions used in the proof of Lemma 3.1 is precisely the distribution
introduced by Calinescu, Karloff and Rabani in [9]. In [17] Fakcharoenphol, Rao and Talwar proved the
following estimate for the same distribution:

$$\Pr\left[B_X(x,t) \subseteq \mathscr{P}(x)\right] \ge 1 - O\left(\frac{t}{\Delta}\log\frac{|B_X(x,\Delta)|}{|B_X(x,\Delta/8)|}\right). \qquad (2)$$

Clearly the bound (1) is stronger than (2), and in particular it yields a non-trivial estimate even for large
values of $t$ for which the lower bound in (2) is negative. This improvement is crucial for our proof of
Theorem 1.1. The use of the "local ratio of balls" (or "local growth") in the estimate (2) of Fakcharoenphol,
Rao and Talwar was a fundamental breakthrough, which, apart from their striking application in [17], has
since found several applications in mathematics and computer science (see [25, 24, 1]).

Proof of Lemma 3.1. Write $X = \{x_1, \ldots, x_n\}$. Let $R$ be chosen uniformly at random from the interval
$[\Delta/4, \Delta/2]$, and let $\pi$ be a permutation of $\{1, \ldots, n\}$ chosen uniformly at random from all such
permutations (here, and in what follows, $R$ and $\pi$ are independent). Define $C_1 := B_X(x_{\pi(1)}, R)$ and inductively for
$2 \le j \le n$,

$$C_j := B_X\left(x_{\pi(j)}, R\right) \setminus \bigcup_{i=1}^{j-1} C_i.$$

Finally we let $\mathscr{P} := \{C_1, \ldots, C_n\} \setminus \{\emptyset\}$. Clearly $\mathscr{P}$ is a (random) $\Delta$-bounded partition of $X$.
For every $r \in [\Delta/4, \Delta/2]$,

$$\Pr\left[B_X(x, t) \subseteq \mathscr{P}(x) \,\middle|\, R = r\right] \ge \frac{|B_X(x, r - t)|}{|B_X(x, r + t)|}. \qquad (3)$$

Indeed, if $R = r$, then the triangle inequality implies that if in the random order induced by the permutation $\pi$
on the points of the ball $B_X(x, r + t)$ the minimal element is from the ball $B_X(x, r - t)$, then $B_X(x, t) \subseteq \mathscr{P}(x)$
(see Figure 1 for a schematic description of this situation). This event happens with probability $\frac{|B_X(x, r-t)|}{|B_X(x, r+t)|}$,
implying (3).




Figure 1: A schematic description of the lower bound in (3). The clusters that are induced by points which lie outside
the ball $B_X(x, r + t)$, such as $c$, cannot touch the ball $B_X(x, t)$. On the other hand, if a point from $B_X(x, r - t)$, such as
$a$, appeared first in the random order among the points in $B_X(x, r + t)$ then its cluster will "swallow" the ball $B_X(x, t)$.
The probability for this to happen is $\frac{|B_X(x, r-t)|}{|B_X(x, r+t)|}$. Only points in the shaded region can split the ball $B_X(x, t)$.
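
Write $\frac{\Delta}{8t} = k + \beta$, where $\beta \in [0, 1)$ and $k$ is a positive integer. Then

$$\begin{aligned}
\Pr\left[B_X(x,t) \subseteq \mathscr{P}(x)\right]
&\ge \frac{4}{\Delta}\int_{\Delta/4}^{\Delta/2} \frac{|B_X(x, r-t)|}{|B_X(x, r+t)|}\,dr \qquad (4)\\
&= \frac{4}{\Delta}\int_0^{2t} \sum_{j=0}^{k-1} \frac{|B_X(x, \frac{\Delta}{4} + 2jt + s - t)|}{|B_X(x, \frac{\Delta}{4} + 2jt + s + t)|}\,ds + \frac{4}{\Delta}\int_{\frac{\Delta}{4}+2kt}^{\frac{\Delta}{2}} \frac{|B_X(x, r-t)|}{|B_X(x, r+t)|}\,dr\\
&\ge \frac{4k}{\Delta}\int_0^{2t} \left[\frac{|B_X(x, \frac{\Delta}{4} + s - t)|}{|B_X(x, \frac{\Delta}{4} + 2kt + s - t)|}\right]^{1/k} ds + \frac{4}{\Delta}\left(\frac{\Delta}{4} - 2kt\right)\frac{|B_X(x, \frac{\Delta}{4} + 2kt - t)|}{|B_X(x, \frac{\Delta}{2} + t)|} \qquad (5)\\
&\ge \frac{8kt}{\Delta}\left[\frac{|B_X(x, \frac{\Delta}{4} - t)|}{|B_X(x, \frac{\Delta}{4} + 2kt + t)|}\right]^{1/k} + \left(1 - \frac{8kt}{\Delta}\right)\frac{|B_X(x, \frac{\Delta}{4} + 2kt - t)|}{|B_X(x, \frac{\Delta}{2} + t)|}\\
&\ge \left[\frac{|B_X(x, \frac{\Delta}{4} - t)|}{|B_X(x, \frac{\Delta}{4} + 2kt + t)|}\right]^{8t/\Delta} \cdot \left[\frac{|B_X(x, \frac{\Delta}{4} + 2kt - t)|}{|B_X(x, \frac{\Delta}{2} + t)|}\right]^{1 - 8kt/\Delta} \qquad (6)\\
&\ge \left[\frac{|B_X(x, \Delta/8)|}{|B_X(x, \Delta)|}\right]^{\frac{8t}{\Delta} + 1 - \frac{8kt}{\Delta}} \ge \left[\frac{|B_X(x, \Delta/8)|}{|B_X(x, \Delta)|}\right]^{16t/\Delta}, \qquad (7)
\end{aligned}$$

where in (4) we used (3), in (5) we used the arithmetic mean/geometric mean inequality (the resulting product
telescopes, since $\frac{\Delta}{4} + 2jt + s + t = \frac{\Delta}{4} + 2(j+1)t + s - t$), in (6) we used the
elementary inequality $\theta a + (1 - \theta)b \ge a^{\theta} b^{1-\theta}$, which holds for all $\theta \in [0, 1]$ and $a, b \ge 0$, with $\theta = 8kt/\Delta$, and in (7) we used
the bound $t \le \Delta/8$, which places all the radii in $[\Delta/8, \Delta]$, together with the fact that $\frac{\Delta}{8t} - k - 1$ is negative. □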
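
The random partition used in this proof is short enough to write out. The following Python sketch samples it for a finite point set given by a distance function; it is a direct quadratic-time illustration rather than an optimized implementation, and the function name `ckr_partition` and the `rng` parameter are assumptions made for this example.

```python
import random


def ckr_partition(points, dist, delta, rng=random):
    """Sample a delta-bounded partition as in the proof of Lemma 3.1.

    A radius R is drawn uniformly from [delta/4, delta/2] and the points are
    visited in a uniformly random order; each visited point claims every
    still-unassigned point within distance R of it.
    """
    radius = rng.uniform(delta / 4, delta / 2)
    order = list(points)
    rng.shuffle(order)                 # uniformly random permutation pi
    unassigned = set(order)
    clusters = []
    for center in order:
        # C_j = B(x_{pi(j)}, R) minus everything claimed by earlier centers
        cluster = {y for y in unassigned if dist(center, y) <= radius}
        if cluster:                    # empty sets are dropped from the partition
            clusters.append(cluster)
            unassigned -= cluster
    return clusters
```

Every cluster lies in a ball of radius $R \le \Delta/2$, so its diameter is at most $\Delta$; passing a seeded `random.Random` instance as `rng` makes a run reproducible.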

The following theorem, in conjunction with Lemma 2.1, implies Theorem 1.1.

Theorem 3.2. For every $\alpha > 1$, every finite metric space $(X, d_X)$ admits a completely $1/\alpha$-padded random
partition tree with exponent $16/\alpha$.

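Proof. Fix $\alpha > 1$. Without loss of generality we may assume that $\mathrm{diam}(X) = 1$. We construct a partition
tree $\{\mathscr{E}_k\}_{k=0}^{\infty}$ as follows. Set $\mathscr{E}_0 = \{X\}$. Having defined $\mathscr{E}_k$ we let $\mathscr{P}_{k+1}$ be a partition as in Lemma 3.1
with $\Delta = 8^{-k-1}$ and $t = \Delta/\alpha$ (the random partition $\mathscr{P}_{k+1}$ is chosen independently of the random partitions
$\mathscr{P}_1, \ldots, \mathscr{P}_k$). Define $\mathscr{E}_{k+1}$ to be the common refinement of $\mathscr{E}_k$ and $\mathscr{P}_{k+1}$, i.e.

$$\mathscr{E}_{k+1} := \left\{C \cap C' :\ C \in \mathscr{E}_k,\ C' \in \mathscr{P}_{k+1}\right\}.$$

The construction implies that for every $x \in X$ and every $k \ge 0$ we have $\mathscr{E}_{k+1}(x) = \mathscr{E}_k(x) \cap \mathscr{P}_{k+1}(x)$. Thus
one proves inductively that

$$\forall\, k \in \mathbb{N},\ B_X\left(x, \frac{8^{-k}}{\alpha}\right) \subseteq \mathscr{P}_k(x) \implies \forall\, k \in \mathbb{N},\ B_X\left(x, \frac{8^{-k}}{\alpha}\right) \subseteq \mathscr{E}_k(x).$$

From Lemma 3.1 and the independence of $\{\mathscr{P}_k\}_{k=1}^{\infty}$ it follows that

$$\begin{aligned}
\Pr\left[\forall\, k \in \mathbb{N},\ B_X\left(x, \frac{8^{-k}}{\alpha}\right) \subseteq \mathscr{E}_k(x)\right]
&\ge \Pr\left[\forall\, k \in \mathbb{N},\ B_X\left(x, \frac{8^{-k}}{\alpha}\right) \subseteq \mathscr{P}_k(x)\right]\\
&= \prod_{k=1}^{\infty} \Pr\left[B_X\left(x, \frac{8^{-k}}{\alpha}\right) \subseteq \mathscr{P}_k(x)\right]\\
&\ge \prod_{k=1}^{\infty} \left[\frac{|B_X(x, 8^{-k-1})|}{|B_X(x, 8^{-k})|}\right]^{16/\alpha}\\
&= |B_X(x, 1/8)|^{-16/\alpha} \ge |X|^{-16/\alpha}.
\end{aligned}$$

□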
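
The common-refinement step of this construction can be sketched in a few lines of Python. The sketch assumes $\mathrm{diam}(X) = 1$ and takes the partition sampler as a parameter: `sample_partition(delta)` is a hypothetical caller-supplied routine returning a $\delta$-bounded partition as a list of sets (in the proof this would be the random partition of Lemma 3.1; the example below substitutes a deterministic bucketing so that the output is predictable).

```python
def common_refinement(coarse, fine):
    """All non-empty intersections of a cluster of `coarse` with a cluster of `fine`."""
    return [c & d for c in coarse for d in fine if c & d]


def partition_tree(points, sample_partition, levels):
    """Build E_0, ..., E_levels with E_0 = {X} and E_{k+1} refining E_k.

    Level k+1 is the common refinement of level k with a fresh partition
    sampled at scale 8^{-(k+1)}.
    """
    tree = [[frozenset(points)]]
    for k in range(levels):
        fresh = sample_partition(8.0 ** (-(k + 1)))    # plays the role of P_{k+1}
        tree.append(common_refinement(tree[-1], fresh))
    return tree


# Deterministic stand-in for the random partition: bucket points on a line.
pos = [0.0, 0.1, 1.0]


def buckets(delta):
    groups = {}
    for i, p in enumerate(pos):
        groups.setdefault(int(p // delta), set()).add(i)
    return list(groups.values())


tree = partition_tree(range(3), buckets, levels=2)
```

Because each level is obtained by intersecting with the previous one, the refinement property of a partition tree holds by construction, whatever sampler is plugged in.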



4 Applications to proximity data structures 

In this section we show how Theorem 3.2 can be applied to the design of various proximity data structures,
which are listed below. Before doing so we shall recall some standard facts about tree representations
of ultrametrics, all of which can be found in the discussion in [5]. Any finite ultrametric $(X, \rho)$ can be
represented by a rooted tree $T = (V, E)$ with labels $\Delta : V \to (0, \infty)$, whose leaves are $X$, and such that if
$u, v \in V$ and $v$ is a child of $u$ then $\Delta(v) \le \Delta(u)$. Given $x, y \in X$ we then have $\rho(x,y) = \Delta(\mathrm{lca}(x,y))$, where
$\mathrm{lca}(x,y)$ is the least common ancestor of $x$ and $y$ in $T$. For $k \ge 1$ the labelled tree described above is called
a $k$-HST (hierarchically well separated tree) if its labels satisfy the stronger decay condition $\Delta(v) \le \frac{\Delta(u)}{k}$
whenever $v$ is a child of $u$. The tree $T$ is called an exact $k$-HST if we actually have an equality $\Delta(v) = \frac{\Delta(u)}{k}$
whenever $v$ is a child of $u$. Lemma 3.5 in [5] implies that any $n$-point ultrametric is $k$-equivalent to a metric
on a $k$-HST which can be computed in time $O(n)$.
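
As an illustration of the tree representation just recalled, here is a small Python sketch of a labelled tree whose induced distance is the label of the least common ancestor. It is a sketch under assumptions: the class name `LabelledTree` is hypothetical, leaves are given label 0 so that $\rho(x,x) = 0$, and the least common ancestor is found by walking up the tree, which takes time proportional to the depth rather than constant time.

```python
class LabelledTree:
    """Rooted tree with labels; rho(x, y) is the label of lca(x, y)."""

    def __init__(self):
        self.parent = {}    # node -> parent (None for the root)
        self.label = {}     # node -> label; leaves get 0 by convention here

    def add(self, node, parent=None, label=0.0):
        self.parent[node] = parent
        self.label[node] = label

    def ancestors(self, node):
        """The path node, parent(node), ..., root."""
        path = []
        while node is not None:
            path.append(node)
            node = self.parent[node]
        return path

    def lca(self, x, y):
        on_path = set(self.ancestors(x))
        return next(v for v in self.ancestors(y) if v in on_path)

    def rho(self, x, y):
        return self.label[self.lca(x, y)]

    def is_k_hst(self, k):
        """Check the decay condition label(child) <= label(parent) / k on every edge."""
        return all(self.label[v] <= self.label[p] / k
                   for v, p in self.parent.items() if p is not None)


tree = LabelledTree()
tree.add("root", None, 8.0)
tree.add("a", "root", 2.0)
for leaf, par in [("x", "a"), ("y", "a"), ("z", "root")]:
    tree.add(leaf, par)
```

In this example the leaves `x` and `y` are at distance 2 and both are at distance 8 from `z`, so the strong triangle inequality is visible directly from the shape of the tree.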

We start by proving several structural lemmas which will play a crucial role in the design of our new 
data structures. 

Lemma 4.1 (Extending ultrametrics). Let $(X, d_X)$ be a finite metric space, and $\alpha \ge 1$. Fix $\emptyset \ne Y \subseteq X$, and
assume that there exists an ultrametric $\rho$ on $Y$ such that for every $x, y \in Y$, $d_X(x, y) \le \rho(x, y) \le \alpha\, d_X(x, y)$.
Then there exists an ultrametric $\tilde{\rho}$ defined on all of $X$ such that for every $x, y \in X$ we have $d_X(x,y) \le \tilde{\rho}(x,y)$,
and if $x \in X$ and $y \in Y$ then $\tilde{\rho}(x, y) \le 6\alpha\, d_X(x, y)$.

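Proof. Let $T = (V, E)$ be the 1-HST representation of $\rho$, with labels $\Delta : V \to (0, \infty)$. In other words, the
leaves of $T$ are $Y$, and for every $x, y \in Y$ we have $\Delta(\mathrm{lca}(x,y)) = \rho(x,y)$. It will be convenient to augment $T$
by adding an incoming edge to the root with $\Delta(\mathrm{parent}(\mathrm{root})) = \infty$. This clearly does not change the induced
metric on $Y$. For every $x \in X \setminus Y$ let $y \in Y$ be its closest point in $Y$, i.e. $d_X(x,y) = d_X(x, Y)$. Let $u$ be the
least ancestor of $y$ for which $\Delta(u) \ge d_X(x, y)$ (such a $u$ must exist because we added the incoming edge to the
root). Let $v$ be the child of $u$ along the path connecting $u$ and $y$. We add a vertex $w$ on the edge $\{u, v\}$ whose
label is $d_X(x,y)$, and connect $x$ to $T$ as a child of $w$. The resulting tree is clearly still a 1-HST. Repeating this
procedure for every $x \in X \setminus Y$ we obtain a 1-HST $\tilde{T}$ whose leaves are $X$. Denote the labels on $\tilde{T}$ by $\tilde{\Delta}$.

Fix $x, y \in X$, and let $x', y' \in Y$ be the nearest neighbors of $x, y$ (respectively) used in the above construction.
Then

$$\begin{aligned}
\tilde{\Delta}\left(\mathrm{lca}_{\tilde{T}}(x,y)\right) &= \max\left\{\tilde{\Delta}\left(\mathrm{lca}_{\tilde{T}}(x, x')\right),\ \tilde{\Delta}\left(\mathrm{lca}_{\tilde{T}}(y', y)\right),\ \tilde{\Delta}\left(\mathrm{lca}_{\tilde{T}}(x', y')\right)\right\}\\
&\ge \max\left\{d_X(x, x'),\ d_X(y, y'),\ d_X(x', y')\right\}\\
&\ge \frac{d_X(x, x') + d_X(y, y') + d_X(x', y')}{3}\\
&\ge \frac{1}{3}\, d_X(x, y). \qquad (8)
\end{aligned}$$

In the reverse direction, if $x \in X$ and $y \in Y$ let $x' \in Y$ be the closest point in $Y$ to $x$ used in the construction
of $\tilde{T}$. Then $d_X(x', y) \le d_X(x', x) + d_X(x, y) \le 2 d_X(x, y)$. If $\mathrm{lca}_{\tilde{T}}(y, x')$ is an ancestor of $\mathrm{lca}_{\tilde{T}}(x, x')$ then

$$\tilde{\Delta}\left(\mathrm{lca}_{\tilde{T}}(x, y)\right) = \tilde{\Delta}\left(\mathrm{lca}_{\tilde{T}}(x', y)\right) = \rho(x', y) \le \alpha \cdot d_X(x', y) \le 2\alpha \cdot d_X(x, y). \qquad (9)$$

If, on the other hand, $\mathrm{lca}_{\tilde{T}}(y, x')$ is a descendant of $\mathrm{lca}_{\tilde{T}}(x, x')$ then

$$\tilde{\Delta}\left(\mathrm{lca}_{\tilde{T}}(x, y)\right) = \tilde{\Delta}\left(\mathrm{lca}_{\tilde{T}}(x, x')\right) = d_X(x, x') \le d_X(x, y). \qquad (10)$$

Scaling the labels of $\tilde{T}$ by a factor of 3, the required result is a combination of (8), (9) and (10). □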

The following lemma is a structural result on the existence of a certain distribution over decreasing 
chains of subsets of a finite metric space. In what follows we shall call such a distribution a stochastic 
Ramsey chain. A schematic description of this notion, and the way it is used in the ensuing arguments, is 
presented in Figure 2 below.

Lemma 4.2 (Stochastic Ramsey chains). Let $(X, d_X)$ be an $n$-point metric space and $k \ge 1$. Then there
exists a distribution over decreasing sequences of subsets $X = X_0 \supsetneq X_1 \supsetneq X_2 \cdots \supsetneq X_s = \emptyset$ ($s$ itself is a
random variable), such that for all $p > -1/k$,

$$\mathbb{E}\left[\sum_{j=0}^{s-1} |X_j|^p\right] \le \left(\max\left\{\frac{k}{1+pk}, 1\right\}\right) \cdot n^{p + 1/k}, \qquad (11)$$

and such that for each $j \in \{1, \ldots, s\}$ there exists an ultrametric $\rho_j$ on $X$ satisfying for every $x, y \in X$,
$\rho_j(x,y) \ge d_X(x,y)$, and if $x \in X$ and $y \in X_{j-1} \setminus X_j$ then $\rho_j(x,y) \le O(k) \cdot d_X(x,y)$.

Remark 4.1. In what follows we will only use the cases $p \in \{0, 1, 2\}$ in Lemma 4.2. Observe that for $p = 0$,
(11) is simply the estimate $\mathbb{E}s \le k n^{1/k}$.

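Proof of Lemma 4.2. By Theorem 3.2 and the proof of Lemma 2.1 there is a distribution over subsets $Y_1 \subseteq
X_0$ such that $\mathbb{E}|Y_1| \ge n^{1-1/k}$ and there exists an ultrametric $\rho_1$ on $Y_1$ such that every $x, y \in Y_1$ satisfy
$d_X(x,y) \le \rho_1(x,y) \le O(k) \cdot d_X(x,y)$. By Lemma 4.1 we may assume that $\rho_1$ is defined on all of $X$, for every
$x, y \in X$ we have $\rho_1(x,y) \ge d_X(x,y)$, and if $x \in X$ and $y \in Y_1$ then $\rho_1(x,y) \le O(k) \cdot d_X(x,y)$. Define $X_1 = X_0 \setminus Y_1$
and apply the same reasoning to $X_1$, obtaining a random subset $Y_2 \subseteq X_0 \setminus Y_1$ and an ultrametric $\rho_2$. Continuing
in this manner until we arrive at the empty set, we see that there are disjoint subsets $Y_1, \ldots, Y_s \subseteq X$, and
for each $j$ an ultrametric $\rho_j$ on $X$, such that for $x, y \in X$ we have $\rho_j(x,y) \ge d_X(x,y)$, and for $x \in X$,
$y \in Y_j$ we have $\rho_j(x,y) \le O(k) \cdot d_X(x,y)$. Additionally, writing $X_j := X \setminus \bigcup_{i=1}^{j} Y_i$, we have the estimate

$$\mathbb{E}\left[|Y_j| \,\middle|\, Y_1, \ldots, Y_{j-1}\right] \ge |X_{j-1}|^{1 - 1/k}.$$

The proof of (11) is by induction on $n$. For $n = 1$ the claim is obvious, and if $n > 1$ then by the inductive
hypothesis

$$\begin{aligned}
\mathbb{E}\left[\sum_{j=0}^{s-1} |X_j|^p \,\middle|\, Y_1\right]
&\le n^p + \max\left\{\frac{k}{1+pk}, 1\right\} \cdot |X_1|^{p + 1/k}\\
&= n^p + \max\left\{\frac{k}{1+pk}, 1\right\} \cdot n^{p+1/k}\left(1 - \frac{|Y_1|}{n}\right)^{p + 1/k}\\
&\le n^p + \max\left\{\frac{k}{1+pk}, 1\right\} \cdot n^{p+1/k}\left(1 - \min\left\{p + \frac{1}{k}, 1\right\} \cdot \frac{|Y_1|}{n}\right)\\
&= \max\left\{\frac{k}{1+pk}, 1\right\} \cdot n^{p+1/k} + n^p - n^{p - 1 + 1/k}\,|Y_1|.
\end{aligned}$$

Taking expectation with respect to $Y_1$ gives the required result. □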
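
The iterative peeling in this proof has a simple control flow, sketched below in Python. The sketch only models the bookkeeping of the chain: `sample_subset(S)` is a hypothetical caller-supplied routine that, in the setting of the lemma, would sample the padded subset of $S$ together with its ultrametric; here it just returns a subset, and empty samples are retried.

```python
def ramsey_chain(points, sample_subset):
    """Peel off subsets Y_1, Y_2, ... until nothing is left.

    Returns (chain, pieces) with chain = [X_0, ..., X_s] and
    pieces = [Y_1, ..., Y_s], where X_j = X_{j-1} minus Y_j.
    """
    remaining = set(points)
    chain = [set(remaining)]           # X_0 = X
    pieces = []
    while remaining:
        piece = set(sample_subset(frozenset(remaining))) & remaining
        if not piece:
            continue                   # resample; assumes the sampler is eventually non-empty
        pieces.append(piece)
        remaining -= piece
        chain.append(set(remaining))
    return chain, pieces


# Degenerate sampler that removes one point per round, so s equals n.
chain, pieces = ramsey_chain(range(4), lambda S: {min(S)})
```

The quantity bounded in (11) with $p = 1$ is the total size `sum(len(X_j) for X_j in chain[:-1])`, which is what the storage of the data structures built from the chain depends on.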



Observation 4.3. If one does not mind losing a factor of $O(\log n)$ in the construction time and storage of
the Ramsey chain, then an alternative to Lemma 4.2 is to randomly and independently sample $O(n^{1/k} \log n)$
ultrametrics from the Ramsey partitions.

Before passing to the description of our new data structures, we need to say a few words about the 
algorithmic implementation of Lemma 4.2 (this will be the central preprocessing step in our constructions).
The computational model in which we will be working is the RAM model, which is standard in the context
of our type of data-structure problems (see for example [32]). In fact, we can settle for weaker computational
models such as the "Unit cost floating-point word RAM model"; a detailed discussion of these issues can
be found in Section 2.2 of [20].

The natural implementation of the Calinescu-Karloff-Rabani (CKR) random partition used in the proof
of Lemma 3.1 takes $O(n^2)$ time. Denote by $\Phi = \Phi(X)$ the aspect ratio of $X$, i.e. the diameter of $X$ divided
by the minimal positive distance in $X$. The construction of the distribution over partition trees in the proof of
Theorem 3.2 requires performing $O(\log \Phi)$ such decompositions. This results in $O(n^2 \log \Phi)$ preprocessing
time to sample one partition tree from the distribution. Using a standard technique (described for example
in [20, Sections 3.2-3.3]), we dispense with the dependence on the aspect ratio and obtain that the expected
preprocessing time of one partition tree is $O(n^2 \log n)$. Since the argument in [20] is presented in a slightly
different context, we shall briefly sketch it here.

We start by constructing an ultrametric $\rho$ on $X$, represented by an HST $H$, such that for every $x, y \in X$,
$d_X(x,y) \le \rho(x,y) \le n\, d_X(x,y)$. The fact that such a tree exists is contained in [5, Lemma 3.6], and it can be
constructed in time $O(n^2)$ using the Minimum Spanning Tree algorithm. This implementation is done in [20,
Section 3.2]. We then apply the CKR random partition with diameter $\Delta$ as follows: Instead of applying it to
the points in $X$, we apply it to the vertices $u$ of $H$ for which



$$\Delta(u) \le \frac{\Delta}{n^2} < \Delta(\mathrm{parent}(u)). \qquad (12)$$



Each such vertex $u$ represents the entire subtree rooted at $u$ (in particular, we can choose arbitrary leaf
descendants to calculate distances; these distances are calculated using the metric $d_X$), and all the leaves of this subtree are assigned
to the same cluster as $u$ in the resulting partition. This is essentially an application of the algorithm to an
appropriate quotient of X (see the discussion in ^TT\). We actually apply a weighted version of the CKR 
decomposition in the spirit of [25], in which, in the choice of random permutation, each vertex $u$ as above
is chosen with probability proportional to the number of leaves which are descendants of $u$ (note that this
change alters the guarantee of the partition only slightly: We will obtain clusters bounded by $(1 + 1/n^2)\Delta$,
and in the estimate on the padding probability the radii of the balls are changed by only a factor of $(1 \pm 1/n)$).
We also do not process each scale, but rather work in "event driven mode": Vertices of $H$ are put in
non-decreasing order according to their labels in a queue. Each time we pop a new vertex $u$, and partition the
space at all the scales in the range $[\Delta(u), n^2\Delta(u)]$ for which we have not done so already. In doing so
we effectively skip "irrelevant" scales. To estimate the running time of this procedure note that the CKR
decomposition at scale $8^j$ takes time $O(m_j^2)$, where $m_j$ is the number of vertices $u$ of $H$ satisfying (12) with
$\Delta = 8^j$. Note also that each vertex of $H$ participates in at most $O(\log n)$ such CKR decompositions, so
$\sum_j m_j = O(n \log n)$. Hence the running time of the sampling procedure in Lemma 4.2 is, up to a constant
factor, $\sum_j m_j^2 = O(n^2 \log n)$.

The Ramsey chain in Lemma 4.2 will be used in two different ways in the ensuing constructions. For our
approximate distance oracle data structure we will just need that the ultrametric $\rho_j$ is defined on $X_{j-1}$ (and
not all of $X$). Thus, by the above argument, and Lemma 4.2, the expected preprocessing time in this case
is $O\left(\mathbb{E}\sum_{j=0}^{s-1} |X_j|^2 \log |X_j|\right) = O(n^{2+1/k} \log n)$ and the expected storage space is $O\left(\mathbb{E}\sum_{j=0}^{s-1} |X_j|\right) = O(n^{1+1/k})$.

For the purpose of our approximate ranking data structure we will really need the metrics $\rho_j$ to be defined
on all of $X$. Thus in this case the expected preprocessing time will be $O(n^2 \log n \cdot \mathbb{E}s) = O(k n^{2+1/k} \log n)$,
and the expected storage space is $O(n \cdot \mathbb{E}s) = O(k n^{1+1/k})$.

1) Approximate distance oracles. Our improved approximate distance oracle is contained in Theorem 1.2,
which we now prove.

Proof of Theorem 1.2. We shall use the notation in the statement of Lemma 4.2. Let $T_j = (V_j, E_j)$ and
$\Delta_j : V_j \to (0, \infty)$ be the HST representation of the ultrametric $\rho_j$ (which was actually constructed explicitly
in the proofs of Lemma 4.1 and Lemma 4.2). The usefulness of the tree representation stems from the fact
that it is very easy to handle algorithmically. In particular there exists a simple scheme that takes a tree and
preprocesses it in linear time so that it is possible to compute the least common ancestor of two given nodes
in constant time (see [21, 6]). Hence, we can preprocess any 1-HST so that the distance between every two
points can be computed in $O(1)$ time.

For every point $x \in X$ let $i_x$ be the largest index for which $x \in X_{i_x - 1}$. Thus, in particular, $x \in Y_{i_x}$. We
further maintain for every $x \in X$ a vector (in the sense of data-structures) $\mathrm{vec}_x$ of length $i_x$ (with $O(1)$ time
direct access), such that for $i \in \{1, \ldots, i_x\}$, $\mathrm{vec}_x[i]$ is a pointer to the leaf representing $x$ in $T_i$. Now, given
a query $x, y \in X$ assume without loss of generality that $i_x \le i_y$. It follows that $x, y \in X_{i_x - 1}$. We locate the
leaves $\hat{x} = \mathrm{vec}_x[i_x]$ and $\hat{y} = \mathrm{vec}_y[i_x]$ in $T_{i_x}$, and then compute $\Delta_{i_x}(\mathrm{lca}(\hat{x}, \hat{y}))$ to obtain an $O(k)$ approximation
to $d_X(x,y)$. Observe that the above data structure only requires $\rho_j$ to be defined on $X_{j-1}$ (and satisfying the
conclusion of Lemma 4.2 for $x, y \in X_{j-1}$). The expected preprocessing time is $O(n^{2+1/k} \log n)$. The size of
the above data structure is $O\left(\sum_{j=0}^{s} |X_j|\right)$, which is in expectation $O(n^{1+1/k})$. □
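
The query procedure of this proof can be summarized in a short Python sketch. It is an illustration under assumptions: the class name `ChainOracle` is hypothetical, and each ultrametric $\rho_j$ is passed as a plain callable instead of an HST with constant-time least-common-ancestor queries, so the sketch shows which tree is consulted rather than how the tree answers.

```python
class ChainOracle:
    """Query structure of the proof, with callables standing in for the HSTs."""

    def __init__(self, pieces, ultrametrics):
        # pieces[j-1] is Y_j; ultrametrics[j-1] is rho_j, assumed valid on X_{j-1}.
        # index[x] plays the role of i_x, the unique j with x in Y_j.
        self.index = {x: j for j, piece in enumerate(pieces, start=1) for x in piece}
        self.ultrametrics = ultrametrics

    def query(self, x, y):
        j = min(self.index[x], self.index[y])    # both points lie in X_{j-1}
        return self.ultrametrics[j - 1](x, y)


# Tagged stand-ins make it visible which ultrametric answers each query.
oracle = ChainOracle(
    pieces=[{0, 1}, {2, 3}],
    ultrametrics=[lambda x, y: ("rho_1", x, y), lambda x, y: ("rho_2", x, y)],
)
```

A query touches exactly one ultrametric, the one attached to the earlier of the two pieces containing the query points, which is why the query time does not grow with the length of the chain.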

Remark 4.2. Using the distributed labeling for the least common ancestor operation on trees of Peleg [30],
the procedure described in the proof of Theorem 1.2 can be easily converted to a distance labeling data
structure (we refer to [32, Section 3.5] for a description of this problem). We shall not pursue this direction
here, since while the resulting data structure is non-trivial, it does not seem to improve over the known
distance labeling schemes [32].

Figure 2: A schematic description of Ramsey chains and the way they are used to construct approximate distance
oracles and approximate ranking data structures. Ramsey chains are obtained by iteratively applying Theorem 3.2
and Lemma 4.1 to find a decreasing chain of subsets $X = X_0 \supsetneq X_1 \supsetneq X_2 \cdots \supsetneq X_s = \emptyset$ such that $X_j$ can be approximated
by a tree metric $T_{j+1}$. The tree $T_{j+1}$ is, in a sense, a "distance estimator" for $X_j \setminus X_{j+1}$: it can be used to approximately
evaluate the distance from a point in $X_j \setminus X_{j+1}$ to any other point in $X_j$. These trees form an array which is an
approximate distance oracle. In the case of approximate ranking we also need to extend the tree $T_{j+1}$ to a tree on the
entire space $X$ using Lemma 4.1. The nodes that were added to these trees are illustrated by empty circles, and the
dotted lines are their connections to the original tree.

2) Approximate ranking. Before proceeding to our $\alpha$-approximate ranking data structure (Theorem 1.3) we recall the setting of the problem. Thinking of $X$ as a metric on $\{1, \ldots, n\}$, and fixing $\alpha > 1$, the goal here is to associate with every $x \in X$ a permutation $\pi^{(x)}$ of $\{1, \ldots, n\}$ such that $d_X\left(x, \pi^{(x)}(i)\right) \le \alpha \cdot d_X\left(x, \pi^{(x)}(j)\right)$ for every $1 \le i \le j \le n$. This relaxation of the exact proximity ranking induced by the metric $d_X$ allows us to gain storage efficiency, while enabling fast access to this data. By fast access we mean that we can perform the following tasks:

1. Given an element $x \in X$ and $i \in \{1, \ldots, n\}$, find $\pi^{(x)}(i)$ in $O(1)$ time.

2. Given an element $x \in X$ and $y \in X$, find the number $i \in \{1, \ldots, n\}$ such that $\pi^{(x)}(i) = y$, in $O(1)$ time.

We also require the following lemma.

Lemma 4.4. Let $T = (V, E)$ be a rooted tree with $n$ leaves. For $v \in V$, let $\mathscr{L}_T(v)$ be the set of leaves in the subtree rooted at $v$, and denote $\ell_T(v) = |\mathscr{L}_T(v)|$. Then there exists a data structure, that we call Size-Ancestor, which can be constructed in time $O(n)$, so as to answer in time $O(1)$ the following query: Given $\ell \in \mathbb{N}$ and a leaf $x \in V$, find an ancestor $u$ of $x$ such that $\ell_T(u) < \ell \le \ell_T(\mathrm{parent}(u))$. Here we use the convention $\ell_T(\mathrm{parent}(\mathrm{root})) = \infty$.

To the best of our knowledge, the data structure described in Lemma 4.4 has not been previously studied. We therefore include a proof of Lemma 4.4 in Appendix A, and proceed at this point to conclude the proof of Theorem 1.3.

Proof of Theorem 1.3. We shall use the notation in the statement of Lemma 4.2. Let $T_j = (V_j, E_j)$ and $\Delta_j : V_j \to (0, \infty)$ be the HST representation of the ultrametric $\rho_j$. We may assume without loss of generality that each of these trees is binary and does not contain a vertex which has only one child. Before presenting the actual implementation of the data structure, let us explicitly describe the permutation $\pi^{(x)}$ that the data structure will use. For every internal vertex $v \in V_j$ assign arbitrarily the value 0 to one of its children, and the value 1 to the other. This induces a unique (lexicographical) order on the leaves of $T_j$. Next, fix $x \in X$ and $i_x$ such that $x \in Y_{i_x}$. The permutation $\pi^{(x)}$ is defined as follows. Starting from the leaf $x$ in $T_{i_x}$, we scan the path from $x$ to the root of $T_{i_x}$. On the way, when we reach a vertex $u$ from its child $v$, let $w$ denote the sibling of $v$, i.e. the other child of $u$. We next output all the leaves which are descendants of $w$ according to the total order described above. Continuing in this manner until we reach the root of $T_{i_x}$ we obtain a permutation $\pi^{(x)}$ of $X$.

We claim that the permutation $\pi^{(x)}$ constructed above is an $O(k)$-approximation to the proximity ranking induced by $x$. Indeed, fix $y, z \in X$ such that $Ck \cdot d_X(x, y) < d_X(x, z)$, where $C$ is a large enough absolute constant. We claim that $z$ will appear after $y$ in the order induced by $\pi^{(x)}$. This is true since the distances from $x$ are preserved up to a factor of $O(k)$ in the ultrametric $T_{i_x}$. Thus for large enough $C$ we are guaranteed that $d_{T_{i_x}}(x, y) < d_{T_{i_x}}(x, z)$, and therefore $\mathrm{lca}_{T_{i_x}}(x, z)$ is a proper ancestor of $\mathrm{lca}_{T_{i_x}}(x, y)$. Hence in the order just described above, $y$ will be scanned before $z$.

We now turn to the description of the actual data structure, which is an enhancement of the data structure constructed in the proof of Theorem 1.2. As in the proof of Theorem 1.2, our data structure will consist of a "vector of the trees" $T_j$, where we maintain for each $x \in X$ a pointer to the leaf representing $x$ in each $T_j$. The remaining description of our data structure will deal with each tree $T_j$ separately. First of all, with each vertex $v \in T_j$ we also store the number of leaves which are the descendants of $v$, i.e. $|\mathscr{L}_{T_j}(v)|$ (note that all these numbers can be computed in $O(n)$ time using, say, depth-first search). With each leaf of $T_j$ we also store its index in the order described above. There is a reverse indexing by a vector for each tree $T_j$ that allows, given an index, to find the corresponding leaf of $T_j$ in $O(1)$ time. Each internal vertex contains a pointer to its leftmost (smallest) and rightmost (largest) descendant leaves. This data structure can clearly be constructed in $O(n)$ time using, e.g., depth-first traversal of the tree. We now give details on how to answer the required queries using the "ammunition" we have listed above.

1. Using Lemma 4.4, find an ancestor $v$ of $x$ such that $\ell_{T_j}(v) < i \le \ell_{T_j}(\mathrm{parent}(v))$ in $O(1)$ time. Let $u = \mathrm{parent}(v)$ (note that $v$ cannot be the root). Let $w$ be the sibling of $v$ (i.e. the other child of $u$). Next we pick the leaf numbered $\left(i - \ell_{T_j}(v)\right) + \mathrm{left}(w) - 1$, where $\mathrm{left}(w)$ is the index of the leftmost descendant of $w$.

2. Find $u = \mathrm{lca}(x, y)$ (in $O(1)$ time, using [21, 6]). Let $v$ and $w$ be the children of $u$ which are ancestors of $x$ and $y$, respectively. Return $\ell_{T_j}(v) + \mathrm{ind}(y) - \mathrm{left}(w)$, where $\mathrm{ind}(y)$ is the index of $y$ in the total order of the leaves of the tree.

This concludes the construction of our approximate ranking data structure. Because we need to have the ultrametric $\rho_j$ defined on all of $X$, the preprocessing time is $O\left(k n^{2+1/k} \log n\right)$ and the storage size is $O\left(k n^{1+1/k}\right)$, as required. □
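The two ranking queries can be illustrated on a single binary tree with the following Python sketch. It is not the constant-time structure of the proof: the Size-Ancestor and lca lookups are replaced by naive walks up the tree, and all names (`Vertex`, `preprocess`, `element_at`, `rank_of`) are hypothetical. The sketch assumes 1-based ranks with $x$ itself at rank 1; under that convention the inverse query carries an extra $+1$ relative to the expression in item 2, so that the two queries are mutual inverses.

```python
class Vertex:
    def __init__(self, name=None, children=()):
        self.name = name
        self.children = list(children)   # empty for a leaf, two vertices otherwise
        self.parent = None
        for c in self.children:
            c.parent = self


def preprocess(root):
    """Store size, leftmost-leaf index and leaf index; return the 1-based leaf table."""
    leaf_at = [None]
    def dfs(v):
        if not v.children:
            leaf_at.append(v)
            v.ind = v.left = len(leaf_at) - 1
            v.size = 1
            return
        for c in v.children:
            dfs(c)
        v.left = v.children[0].left
        v.size = sum(c.size for c in v.children)
    dfs(root)
    return leaf_at


def sibling(v):
    a, b = v.parent.children
    return b if a is v else a


def element_at(leaf_at, x, i):
    """Return the leaf of rank i (1-based) in the order scanned from leaf x."""
    if i == 1:
        return x
    v = x
    while v.parent.size < i:             # naive stand-in for the Size-Ancestor query
        v = v.parent
    w = sibling(v)
    return leaf_at[(i - v.size) + w.left - 1]


def rank_of(x, y):
    """Return the rank (1-based) of leaf y in the order scanned from leaf x."""
    if x is y:
        return 1
    side_of_x = {}                       # ancestor of x -> its child on x's side
    v = x
    while v.parent is not None:
        side_of_x[id(v.parent)] = v
        v = v.parent
    w = y
    while id(w.parent) not in side_of_x: # naive stand-in for the lca query
        w = w.parent
    v = side_of_x[id(w.parent)]
    return v.size + y.ind - w.left + 1


a, b, c, d, e = (Vertex(n) for n in "abcde")
root = Vertex(children=[Vertex(children=[a, Vertex(children=[b, c])]),
                        Vertex(children=[d, e])])
leaf_at = preprocess(root)
```

In this toy tree, scanning from leaf `b` visits the leaves in the order b, c, a, d, e, which is what `element_at` reproduces rank by rank.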

Remark 4.3. Our approximate ranking data structure can also be used in a nearest neighbor heuristic called "Orchard Algorithm" [28] (see also [13, Sec. 3.2]). In this algorithm the vanilla heuristic can be used to obtain the exact proximity ranking, and requires storage $\Omega(n^2)$. Using approximate ranking the storage requirement can be significantly improved, though the query performance is somewhat weaker due to the inaccuracy of the ranking lists.

3) Computing the Lipschitz constant. Here we describe a data structure for computing the Lipschitz constant of a function $f : X \to Y$, where $(Y, d_Y)$ is an arbitrary metric space. When $(X, d_X)$ is a doubling metric space (see [22]), this problem was studied in [20]. In what follows we shall always assume that $f$ is given in oracle form, i.e. it is encoded in such a way that we can compute its value on a given point in constant time.
Lemma 4.5. There is an algorithm that, given an $n$-point ultrametric $(U, d_U)$ defined by the HST $T = (V, E)$ (in particular $U$ is the set of leaves of $T$), an arbitrary metric space $(Y, d_Y)$, and a mapping $f : U \to Y$, returns in time $O(n)$ a number $A > 0$ satisfying $\|f\|_{\mathrm{Lip}} \ge A \ge \frac{1}{16} \cdot \|f\|_{\mathrm{Lip}}$.

Proof. We assume that $T$ is a 4-HST. As remarked in the beginning of Section 3, this can be achieved by distorting the distances in $U$ by a factor of at most 4, in $O(n)$ time. We also assume that the tree $T$ stores for every vertex $v \in V$ an arbitrary leaf $x_v \in U$ which is a descendant of $v$ (this can be easily computed in $O(n)$ time). For a vertex $u \in V$ we denote by $\Delta(u)$ its label (i.e. $\forall\, x, y \in U$, $d_U(x, y) = \Delta(\mathrm{lca}(x, y))$).
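The algorithm is as follows:

Lip-UM($T$, $f$)
    $A \leftarrow 0$
    For every vertex $u \in T$ do
        Let $v_1, \ldots, v_r$ be the children of $u$.
        $A \leftarrow \max\left\{A,\ \max_{2 \le i \le r} \frac{d_Y\left(f(x_{v_1}), f(x_{v_i})\right)}{\Delta(u)}\right\}$
    Output $A$.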
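Clearly the algorithm runs in linear time (the total number of vertices in the tree is $O(n)$ and each vertex is visited at most twice). Furthermore, by construction the algorithm outputs $A \le \|f\|_{\mathrm{Lip}}$. It remains to prove a lower bound on $A$. Let $x_1, x_2 \in U$ be such that $\|f\|_{\mathrm{Lip}} = \frac{d_Y(f(x_1), f(x_2))}{d_U(x_1, x_2)}$, and denote $u = \mathrm{lca}(x_1, x_2)$. Let $w_1, w_2$ be the children of $u$ such that $x_1 \in \mathscr{L}_T(w_1)$ and $x_2 \in \mathscr{L}_T(w_2)$. Let $v_1$ be the "first child" of $u$ as ordered by the algorithm Lip-UM (notice that this vertex has a special role). Then

$$A \ge \max\left\{\frac{d_Y\left(f(x_{w_1}), f(x_{v_1})\right)}{\Delta(u)},\ \frac{d_Y\left(f(x_{w_2}), f(x_{v_1})\right)}{\Delta(u)}\right\} \ge \frac{1}{2} \cdot \frac{d_Y\left(f(x_{w_1}), f(x_{w_2})\right)}{\Delta(u)} \ge \frac{1}{2} \cdot \frac{d_Y(f(x_1), f(x_2)) - \mathrm{diam}\left(f(\mathscr{L}_T(w_1))\right) - \mathrm{diam}\left(f(\mathscr{L}_T(w_2))\right)}{\Delta(u)}.$$

If $\max\left\{\mathrm{diam}\left(f(\mathscr{L}_T(w_1))\right), \mathrm{diam}\left(f(\mathscr{L}_T(w_2))\right)\right\} \le \frac{1}{4} d_Y(f(x_1), f(x_2))$, then we conclude that

$$A \ge \frac{1}{4} \cdot \frac{d_Y(f(x_1), f(x_2))}{\Delta(u)},$$

as needed. Otherwise, assuming that $\mathrm{diam}\left(f(\mathscr{L}_T(w_1))\right) > \frac{1}{4} \cdot d_Y(f(x_1), f(x_2))$, there exist $z, z' \in \mathscr{L}_T(w_1)$ such that

$$\frac{d_Y(f(z), f(z'))}{d_U(z, z')} > \frac{\frac{1}{4} \cdot d_Y(f(x_1), f(x_2))}{\Delta(u)/4} = \|f\|_{\mathrm{Lip}},$$

which is a contradiction. □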
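As a concrete companion to the Lip-UM procedure, here is a minimal Python sketch together with a brute-force reference on a tiny hand-built 4-HST. The class `Node` and the functions `lip_um`, `leaves` and `brute_force_lipschitz` are names chosen for this illustration, and the recursion is only meant for small trees.

```python
class Node:
    def __init__(self, label=0.0, children=(), point=None):
        self.label = label               # Delta(u); unused for leaves
        self.children = list(children)
        self.point = point               # the point of U stored at a leaf


def lip_um(root, f, d_Y):
    """One pass over the tree, comparing the first child's representative to the others."""
    best = 0.0
    def visit(u):
        nonlocal best
        if not u.children:
            return u.point               # a leaf represents itself
        reps = [visit(c) for c in u.children]
        for r in reps[1:]:
            best = max(best, d_Y(f(reps[0]), f(r)) / u.label)
        return reps[0]                   # x_u is taken from the first child
    visit(root)
    return best


def leaves(u):
    return [u.point] if not u.children else [p for c in u.children for p in leaves(c)]


def brute_force_lipschitz(root, f, d_Y):
    """Exact Lipschitz constant over all pairs (quadratic time, for checking only)."""
    best, stack = 0.0, [root]
    while stack:
        u = stack.pop()
        groups = [leaves(c) for c in u.children]
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                for p in groups[i]:
                    for q in groups[j]:
                        best = max(best, d_Y(f(p), f(q)) / u.label)
        stack.extend(u.children)
    return best


# A 4-HST on four points: three leaves under a vertex of label 4, root label 16.
inner = Node(label=4.0, children=[Node(point=0), Node(point=1), Node(point=2)])
root = Node(label=16.0, children=[inner, Node(point=3)])
values = {0: 0.0, 1: -2.0, 2: 2.0, 3: 1.0}
f = values.__getitem__
d_Y = lambda a, b: abs(a - b)

A = lip_um(root, f, d_Y)
L = brute_force_lipschitz(root, f, d_Y)
```

Here the pair of points 1 and 2 attains the true Lipschitz constant but is never compared directly, so the estimate is smaller than the exact value while staying within the factor guaranteed for a 4-HST.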

Theorem 4.6. Given $k \ge 1$, any $n$-point metric space $(X, d_X)$ can be preprocessed in time $O\left(n^{2+1/k} \log n\right)$, yielding a data structure requiring storage $O\left(n^{1+1/k}\right)$ which can answer in $O\left(n^{1+1/k}\right)$ time the following query: Given a metric space $(Y, d_Y)$ and a mapping $f : X \to Y$, compute a value $A > 0$ such that $\|f\|_{\mathrm{Lip}} \ge A \ge \|f\|_{\mathrm{Lip}}/O(k)$.

Proof. The preprocessing is simply computing the trees $\{T_j\}_{j=1}^{s}$ as in the proof of Theorem 1.2. Denote the resulting ultrametrics by $(U_1, \rho_1), \ldots, (U_s, \rho_s)$. Given $f : X \to Y$, represent it as $g_i : U_i \to Y$ (as a mapping, $g_i$ is the same as the restriction of $f$ to $U_i$). Use Lemma 4.5 to compute an estimate $A_i$ of $\|g_i\|_{\mathrm{Lip}}$, and return $A := \max_i A_i$. Since all the distances in $U_i$ dominate the distances in $X$, $\|f\|_{\mathrm{Lip}} \ge \|g_i\|_{\mathrm{Lip}} \ge A_i$, so $\|f\|_{\mathrm{Lip}} \ge A$. On the other hand, let $x, y \in X$ be such that $\|f\|_{\mathrm{Lip}} = \frac{d_Y(f(x), f(y))}{d_X(x, y)}$. By Lemma 4.2 there exists $i \in \{1, \ldots, s\}$ such that $d_{U_i}(x, y) \le O(k) \cdot d_X(x, y)$, and hence $\|g_i\|_{\mathrm{Lip}} \ge \|f\|_{\mathrm{Lip}}/O(k)$. And so $A \ge \frac{1}{16} \cdot \|g_i\|_{\mathrm{Lip}} \ge \|f\|_{\mathrm{Lip}}/O(k)$, as required. Since we once more only need that the ultrametric $\rho_j$ is defined on $X_{j-1}$ and not on all of $X$, the preprocessing time and storage space are the same as in Theorem 1.2. By Lemma 4.5 the query time is $O\left(\sum_{j} |X_j|\right) = O\left(n^{1+1/k}\right)$ (we have an $O(|X_j|)$ time computation of the Lipschitz constant on each $X_j$). □



5 Concluding Remarks 

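An $s$-well separated pair decomposition (WSPD) of an $n$-point metric space $(X, d_X)$ is a collection of pairs of subsets $\{(A_i, B_i)\}_{i=1}^{M}$, $A_i, B_i \subseteq X$, such that

1. $\forall\, x, y \in X$, if $x \ne y$ then $(x, y) \in \bigcup_{i=1}^{M} (A_i \times B_i)$.

2. For all $i \ne j$, $(A_i \times B_i) \cap (A_j \times B_j) = \emptyset$.

3. For all $i \in \{1, \ldots, M\}$, $d_X(A_i, B_i) \ge s \cdot \max\{\mathrm{diam}(A_i), \mathrm{diam}(B_i)\}$.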
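The three conditions above can be checked mechanically. The following Python sketch does so by brute force; the function names `diam`, `set_dist` and `is_wspd` are hypothetical, and the example (an equilateral space covered by singleton pairs) is only an illustration of the definition.

```python
from itertools import product


def diam(S, d):
    """Diameter of a finite set S under the metric d (0 for a singleton)."""
    return max((d(x, y) for x in S for y in S), default=0.0)


def set_dist(A, B, d):
    """Smallest distance between a point of A and a point of B."""
    return min(d(a, b) for a in A for b in B)


def is_wspd(points, d, pairs, s):
    """Check conditions 1-3 of an s-WSPD for a list of (A, B) pairs."""
    covered = set()
    for A, B in pairs:
        # Condition 3: the pair is s-well separated.
        if set_dist(A, B, d) < s * max(diam(A, d), diam(B, d)):
            return False
        # Condition 2: no ordered pair of points is covered twice.
        for a, b in product(A, B):
            if (a, b) in covered:
                return False
            covered.add((a, b))
    # Condition 1: every ordered pair of distinct points is covered.
    return all((x, y) in covered for x in points for y in points if x != y)


# Equilateral space on four points: all distinct points are at distance 1.
points = [0, 1, 2, 3]
equilateral = lambda x, y: 0.0 if x == y else 1.0
singletons = [({x}, {y}) for x in points for y in points if x != y]
```

In the equilateral example any set with two or more points has diameter 1, equal to every inter-set distance, so for $s = 2$ only singleton pairs are well separated and the decomposition needs $n(n-1)$ pairs.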

The notion of $s$-WSPD was first defined for Euclidean spaces in an influential paper of Callahan and Kosaraju [11], where it was shown that for $n$-point subsets of a fixed dimensional Euclidean space there exists such a collection of size $O(n)$ that can be constructed in $O(n \log n)$ time. Subsequently, this concept has been used in many geometric algorithms (e.g. [33, 10]), and is today considered to be a basic tool in computational geometry. Recently the definition and the efficient construction of WSPD were generalized to the more abstract setting of doubling metrics [31, 20]. These papers have further demonstrated the usefulness of this tool (see also [18] for a mathematical application).

It would clearly be desirable to have a notion similar to WSPD in general metrics. However, as formulated above, no non-trivial WSPD is possible in "high dimensional" spaces, since any 2-WSPD of an $n$-point equilateral space must be of size $\Omega(n^2)$. The present paper suggests that Ramsey partitions might be a partial replacement of this notion which works for arbitrary metric spaces. Indeed, among the applications of WSPD in fixed dimensional metrics are approximate ranking (this application does not seem to have appeared in print; it was pointed out to us by Sariel Har-Peled), approximate distance oracles [19, 20], spanners [31, 20], and computation of the Lipschitz constant [20]. These applications have been obtained for general metrics using Ramsey partitions in the present paper (spanners were not discussed here since our approach does not seem to beat previously known constructions). We believe that this direction deserves further scrutiny, as there are more applications of WSPD which might be transferable to general metrics using Ramsey partitions. With this in mind it is worthwhile to note here that our procedure for constructing stochastic Ramsey chains, as presented in Section 4, takes roughly $n^{2+1/k}$ time (up to logarithmic terms). For applications it would be desirable to improve this construction time to $O(n^2)$. The construction time of certain proximity data structures is a well studied topic in the computer science literature; see for
example El ESI- 
Appendices

A The Size-Ancestor data structure

In this appendix we prove Lemma 4.4. Without loss of generality we assume that the tree $T$ does not contain vertices with only one child. Indeed, such vertices will never be returned as an answer for a query, and thus can be eliminated in $O(n)$ time in a preprocessing step.

Our data structure is composed in a modular way of two different data structures, the first of which is described in the following lemma, while the second is discussed in the proof of Lemma 4.4 that will follow.

Lemma A.1. Fix $m \in \mathbb{N}$, and let $T$ be as in Lemma 4.4. Then there exists a data structure which can be preprocessed in time $O\left(n + \frac{n \log n}{m}\right)$ and answers in time $O(1)$ the following query: Given $\ell \in \mathbb{N}$ and a leaf $x \in V$, find an ancestor $u$ of $x$ such that $\ell_T(u) < \ell m \le \ell_T(\mathrm{parent}(u))$. Here we use the convention $\ell_T(\mathrm{parent}(\mathrm{root})) = \infty$.

Proof. Denote by $X$ the set of leaves of $T$. For every internal vertex $v \in V$, order its children non-increasingly according to the number of leaves in the subtrees rooted at them. Such a choice of labels induces a unique total order on $X$ (the lexicographic order). Denote this order by $\preceq$ and let $f : \{1, \ldots, n\} \to X$ be the unique increasing map in the total order $\preceq$. For every $v \in V$, $f^{-1}(\mathscr{L}_T(v))$ is an interval of integers. Moreover, the set of intervals $\left\{f^{-1}(\mathscr{L}_T(v)) : v \in V\right\}$ forms a laminar set, i.e. for every pair of intervals in this set either one is contained in the other, or they are disjoint. For every $v \in V$ write $f^{-1}(\mathscr{L}_T(v)) = I_v = [A_v, B_v]$, where $A_v, B_v \in \mathbb{N}$ and $A_v \le B_v$. For $i \in \{1, \ldots, \lfloor n/m \rfloor\}$ and $j \in \{1, \ldots, \lceil n/(im) \rceil\}$ let $F_i(j)$ be the set of vertices $v \in V$ such that $|I_v| \ge im$, $I_v \cap [(j-1)im + 1, jim] \ne \emptyset$, and there is no descendant of $v$ satisfying these two conditions. Since at most two disjoint intervals of length at least $im$ can intersect a given interval of length $im$, we see that for all $i, j$, $|F_i(j)| \le 2$.

Claim A.2. Let $x \in X$ be a leaf of $T$, and $\ell \in \mathbb{N}$. Let $u \in V$ be the least ancestor of $x$ for which $\ell_T(u) \ge \ell m$. Then

$$u \in \left\{\mathrm{lca}(x, v) : v \in F_\ell\left(\left\lceil \frac{f^{-1}(x)}{\ell m} \right\rceil\right)\right\}.$$

Proof. If $u \in F_\ell\left(\left\lceil \frac{f^{-1}(x)}{\ell m} \right\rceil\right)$ then since $u = \mathrm{lca}(x, u)$ there is nothing to prove. If on the other hand $u \notin F_\ell\left(\left\lceil \frac{f^{-1}(x)}{\ell m} \right\rceil\right)$, then since we are assuming that $\ell_T(u) \ge \ell m$, and $I_u \cap \left[\left(\left\lceil \frac{f^{-1}(x)}{\ell m} \right\rceil - 1\right)\ell m + 1,\ \left\lceil \frac{f^{-1}(x)}{\ell m} \right\rceil \ell m\right] \ne \emptyset$ (because $f^{-1}(x) \in I_u$), it follows that $u$ has a descendant $v$ in $F_\ell\left(\left\lceil \frac{f^{-1}(x)}{\ell m} \right\rceil\right)$. Thus $u = \mathrm{lca}(x, v)$, by the fact that any ancestor $w$ of $v$ satisfies $\ell_T(w) \ge \ell_T(v) \ge \ell m$, and the minimality of $u$. □

The preprocessing of the data structure begins with ordering the children of vertices non-increasingly according to the number of leaves in their subtrees. The following algorithm achieves it in linear time.

SORT-CHILDREN($u$)
    Compute $\{\ell_T(u)\}_{u \in V}$ using depth first search.
    Sort $V$ non-increasingly according to $\ell_T(\cdot)$ (use bucket sort, see [15, Ch. 9]).
    Let $(v_i)_i$ be the set $V$ sorted as above.
    Initialize $\forall\, u \in V$, the list ChildrenSortedList$_u = \emptyset$.
    For $i \leftarrow 1$ to $|V|$ do
        Add $v_i$ to the end of ChildrenSortedList$_{\mathrm{parent}(v_i)}$.

Computing $f$ and the intervals $\{I_u\}_{u \in V}$ is now done by a depth first search of $T$ that respects the above order of the children. We next compute $\left\{F_i(j) : i \in \{1, \ldots, \lfloor n/m \rfloor\},\ j \in \{1, \ldots, \lceil n/(im) \rceil\}\right\}$ using the following algorithm:

SUBTREE-COUNT($u$)
    Let $v_1, \ldots, v_r$ be the children of $u$ with $|I_{v_1}| \ge |I_{v_2}| \ge \cdots \ge |I_{v_r}|$.
    For $i \leftarrow \lfloor |I_u|/m \rfloor$ down to $\lfloor |I_{v_1}|/m \rfloor + 1$ do
        For $j \leftarrow \lceil A_u/(im) \rceil$ to $\lceil B_u/(im) \rceil$ do
            Add $u$ to $F_i(j)$
    For $h \leftarrow 1$ to $r - 1$ do
        For $i \leftarrow \lfloor |I_{v_h}|/m \rfloor$ down to $\lfloor |I_{v_{h+1}}|/m \rfloor + 1$ do
            For $j \leftarrow \lceil B_{v_h}/(im) \rceil + 1$ to $\lceil B_u/(im) \rceil$ do
                Add $u$ to $F_i(j)$
    For $h \leftarrow 1$ to $r$ do call SUBTREE-COUNT($v_h$).

Here is an informal explanation of the correctness of this algorithm. The only relevant sets $F_i(j)$ which will contain the vertex $u \in V$ are those in the range $i \in \left[\lfloor |I_{v_r}|/m \rfloor + 1,\ \lfloor |I_u|/m \rfloor\right]$. Above this range $I_u$ does not meet the size constraint, and below this range any $F_i(j)$ which intersects $I_u$ must also intersect one of the children of $u$, which also satisfies the size constraint, in which case one of the descendants of $u$ will be in $F_i(j)$. In the aforementioned range, we add $u$ to $F_i(j)$ only for $j$ such that the interval $[(j-1)im + 1, jim]$ does not intersect one of the children of $u$ in a set of size larger than $im$. Here we use the fact that the intervals of the children are sorted in non-increasing order according to their size. Regarding running time, this reasoning implies that each vertex of $T$, and each entry in $F_i(j)$, is accessed by this algorithm only a constant number of times, and each access involves only a constant number of computation steps. So the running time is $O\left(n + \frac{n \log n}{m}\right)$.

We conclude with the query procedure. Given a query $x \in X$ and $\ell \in \mathbb{N}$, access $F_\ell\left(\left\lceil \frac{f^{-1}(x)}{\ell m} \right\rceil\right)$ in $O(1)$ time. Next, for each $v \in F_\ell\left(\left\lceil \frac{f^{-1}(x)}{\ell m} \right\rceil\right)$, check whether $\mathrm{lca}(x, v)$ is the required vertex (we are thus using here also the data structure for computing the lca of [21, 6]; observe also that since $|F_i(j)| \le 2$, we only have a constant number of checks to do). By Claim A.2 this will yield the required result. □

By setting $m = 1$ in Lemma A.1 we obtain a data structure for the Size-Ancestor problem with $O(1)$ query time, but $O(n \log n)$ preprocessing time. To improve upon this, we set $m = \Theta(\log n)$ in Lemma A.1 and deal with the resulting gaps by enumerating all the possible ways in which the remaining $m - 1$ leaves can be added to the tree. Exact details are given below.

Proof of Lemma 4.4. Fix $m = \lfloor (\log n)/4 \rfloor$. Each subset $A \subseteq \{0, \ldots, m-1\}$ is represented as a number $\#A \in \{0, \ldots, 2^m - 1\}$ by $\#A = \sum_{i \in A} 2^i$. We next construct in memory a vector enum of size $2^m$, where enum[$\#A$] is a vector of size $m$, with integer index in the range $\{1, \ldots, m\}$, such that enum[$\#A$][$i$] $= |A \cap \{0, \ldots, i-1\}|$. Clearly enum can be constructed in $O(2^m m) = o(n)$ time.

For each vertex $u$ we compute and store:

• depth($u$), which is the edge distance from the root to $u$.

• $\ell_T(u)$, the number of leaves in the subtree rooted at $u$.

• The number $\#A_u$, where

$$A_u = \left\{k \in \{0, \ldots, m-1\} : u \text{ has an ancestor with exactly } \ell_T(u) + k \text{ descendant leaves}\right\}.$$
We also apply the level ancestor data structure that, after $O(n)$ preprocessing time, answers in constant time queries of the form: Given a vertex $u$ and an integer $d$, find an ancestor of $u$ at depth $d$ (if it exists). Such a data structure is constructed in [7]. Lastly, we use the data structure from Lemma A.1.

With all this machinery in place, a query for the least ancestor of a leaf $x$ having at least $\ell$ leaves is answered in constant time as follows. First compute $q = \lfloor \ell/m \rfloor$. Apply a query to the data structure of Lemma A.1 with $x$ and $q$, and obtain $u$, the least ancestor of $x$ such that $\ell_T(u) \ge qm$. If $\ell_T(u) \ge \ell$ then $u$ is the least ancestor with $\ell$ leaves, so the data structure returns $u$. Otherwise, $\ell_T(u) < \ell$, and let $a = $ enum[$\#A_u$][$\ell - \ell_T(u)$]. Note that depth($u$) $- a$ is the depth of the least ancestor of $u$ having at least $\ell$ leaves, thus the query uses the level ancestor data structure to return this ancestor. Clearly the whole query takes constant time.

It remains to argue that the data structure can be preprocessed in linear time. We already argued about most parts of the data structure, and $\ell_T(u)$ and depth($u$) are easy to compute in linear time. Thus we are left with computing $\#A_u$ for each vertex $u$. This is done using a top-down scan of the tree (e.g., depth first search). The root is assigned 1. Each non-root vertex $u$, whose parent is $v$, is assigned

$$1 + \left(2^{\ell_T(v) - \ell_T(u)} \cdot \#A_v\right) \pmod{2^m}.$$

It is clear that this indeed computes $\#A_u$. The relevant exponents are computed in advance and stored in a lookup table. □

Remark A.1. This data structure can be modified in a straightforward way to answer queries for the least ancestor of a given size (in terms of the number of vertices in its subtree). It is also easy to extend it to queries which are non-leaf vertices.
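The bitmask bookkeeping used in the proof of Lemma 4.4 can be sketched in Python as follows. This is an illustration under stated assumptions: the tree is given as a hypothetical dictionary `children`, the recurrence for $\#A_u$ is the natural one (shift the parent's mask by the difference in leaf counts, set bit 0, and truncate to $m$ bits), and the level ancestor lookup is replaced by a walk along parent pointers. The tree is assumed to have no vertex with exactly one child, so leaf counts strictly increase along every leaf-to-root path.

```python
def build_enum(m):
    """enum[a][i] = |A ∩ {0, ..., i-1}| for the subset A encoded by the bitmask a."""
    return [[bin(a & ((1 << i) - 1)).count("1") for i in range(m + 1)]
            for a in range(1 << m)]


def preprocess(children, root, m):
    """Compute parent, leaf count and the bitmask #A_u for every vertex."""
    parent, size, mask = {root: None}, {}, {}
    order = [root]
    for u in order:                          # parents are listed before children
        for c in children.get(u, []):
            parent[c] = u
            order.append(c)
    for u in reversed(order):                # children are processed before parents
        kids = children.get(u, [])
        size[u] = sum(size[c] for c in kids) if kids else 1
    full = (1 << m) - 1
    for u in order:
        v = parent[u]
        # Bit 0 stands for u itself; the parent's bits move up by size[v] - size[u].
        mask[u] = 1 if v is None else (1 + (mask[v] << (size[v] - size[u]))) & full
    return parent, size, mask


def least_ancestor_with(u, ell, parent, size, mask, enum):
    """Least ancestor of u with at least ell leaves, assuming 0 < ell - size[u] <= m."""
    a = enum[mask[u]][ell - size[u]]         # ancestors (including u) with < ell leaves
    for _ in range(a):                       # naive stand-in for a level ancestor query
        u = parent[u]
    return u


# A caterpillar tree with five leaves L1..L5 and m = 3.
children = {"r": ["c1", "L5"], "c1": ["c2", "L4"], "c2": ["c3", "L3"], "c3": ["L1", "L2"]}
m = 3
enum = build_enum(m)
parent, size, mask = preprocess(children, "r", m)
```

For instance, leaf `L1` has ancestors with 2, 3, 4 and 5 leaves, so its mask records the offsets 0, 1 and 2, and a query with $\ell = 4$ climbs three levels to reach `c1`.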

B The metric Ramsey theorem implies the existence of Ramsey partitions

In this appendix we complete the discussion in Section 2 by showing that the metric Ramsey theorem implies the existence of good Ramsey partitions. The results here are not otherwise used in this paper.

Proposition B.1. Fix $\alpha > 1$ and $\psi \in (0, 1)$, and assume that every $n$-point metric space has a subset of size $n^\psi$ which is $\alpha$-equivalent to an ultrametric. Then every $n$-point metric space $(X, d_X)$ admits a distribution over partition trees $\{\mathscr{P}_k\}_{k=0}^{\infty}$ such that for every $x \in X$,

$$\Pr\left[\forall\, k \in \mathbb{N},\ B_X\left(x, \frac{1}{96\alpha} \cdot 8^{-k}\, \mathrm{diam}(X)\right) \subseteq \mathscr{P}_k(x)\right] \ge \frac{1-\psi}{n^{1-\psi}}.$$

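Proof. Let $(X, d_X)$ be an $n$-point metric space. The argument starts out similarly to the proof of Lemma 4.2. Using the assumptions and Lemma 4.1 iteratively, we find a decreasing chain of subsets $X = X_0 \supsetneq X_1 \supsetneq X_2 \supsetneq \cdots \supsetneq X_s = \emptyset$ and ultrametrics $\rho_1, \ldots, \rho_s$ on $X$, such that if we denote $Y_j = X_{j-1} \setminus X_j$ then $|Y_j| \ge |X_{j-1}|^\psi$, for $x, y \in X$, $\rho_j(x, y) \ge d_X(x, y)$, and for $x \in X$, $y \in Y_j$ we have $\rho_j(x, y) \le 6\alpha\, d_X(x, y)$. As in the proof of Lemma 4.2, it follows by induction that $s \le \frac{1}{1-\psi} \cdot n^{1-\psi}$.

By [5, Lemma 3.5] we may assume that the ultrametric $\rho_j$ can be represented by an exact 2-HST $T_j = (V_j, E_j)$, with vertex labels $\Delta_{T_j}$, at the expense of replacing the factor 6 above by 12. Let $\Delta_j$ be the label of the root of $T_j$, and denote for $k \in \mathbb{N}$, $\Lambda_j^k = \left\{v \in V_j : \Delta_{T_j}(v) = 2^{-k}\Delta_j\right\}$. For every $v \in V_j$ let $\mathscr{L}_j(v)$ be the leaves of $T_j$ which are descendants of $v$. Thus $\mathscr{P}_k^j := \left\{\mathscr{L}_j(v) : v \in \Lambda_j^k\right\}$ is a $2^{-k}\Delta_j$ bounded partition of $X$ (boundedness is in the metric $d_X$). Fix $x \in Y_j$, $k \in \mathbb{N}$, and let $v$ be the unique ancestor of $x$ in $\Lambda_j^k$. If $z \in X$ is such that $d_X(x, z) \le \frac{1}{12\alpha} \cdot 2^{-k}\Delta_j$ then $\Delta_{T_j}\left(\mathrm{lca}_{T_j}(x, z)\right) = \rho_j(x, z) \le 2^{-k}\Delta_j$. It follows that $z$ is a descendant of $v$, so that $z \in \mathscr{P}_k^j(x) = \mathscr{L}_j(v)$. Thus $\mathscr{P}_k^j(x) \supseteq B_X\left(x, \frac{1}{12\alpha} \cdot 2^{-k}\Delta_j\right)$.

Passing to powers of 8 (i.e. choosing for each $k$ the integer $\ell$ such that $8^{-\ell-1}\, \mathrm{diam}(X) < 2^{-k}\Delta_j \le 8^{-\ell}\, \mathrm{diam}(X)$ and indexing the above partitions using $\ell$ instead of $k$), we have thus shown that for every $j \in \{1, \ldots, s\}$ there is a partition tree $\{\mathscr{P}_k^j\}_{k=0}^{\infty}$ such that for every $x \in Y_j$ we have for all $k$,

$$B_X\left(x, \frac{1}{96\alpha} \cdot 8^{-k}\, \mathrm{diam}(X)\right) \subseteq \mathscr{P}_k^j(x).$$

Since the sets $Y_1, \ldots, Y_s$ cover $X$, and $s \le \frac{n^{1-\psi}}{1-\psi}$, the required distribution over partition trees can be obtained by choosing one of the partition trees $\{\mathscr{P}_k^1\}_{k=0}^{\infty}, \ldots, \{\mathscr{P}_k^s\}_{k=0}^{\infty}$ uniformly at random. □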

Remark B.1. Motivated by the re-weighting argument in [12], it is possible to improve the lower bound in Proposition B.1 for $\psi$ in a certain range. We shall now sketch this argument. It should be remarked, however, that there are several variants of this re-weighting procedure (like re-weighting again at each step), so it might be possible to slightly improve upon Proposition B.1 for a larger range of $\psi$. We did not attempt to optimize this argument here.

Fix $\eta \in (0, 1)$ to be determined in the ensuing argument, and let $(X, d_X)$ be an $n$-point metric space. Duplicate each point in $X$ $n^\eta$ times, obtaining a (semi-)metric space $X'$ with $n^{1+\eta}$ points (this can be made into a metric by applying an arbitrarily small perturbation). We shall define inductively a decreasing chain of subsets $X' = X'_0 \supseteq X'_1 \supseteq X'_2 \supseteq \cdots$ as follows. For $x \in X$, let $h_i(x)$ be the number of copies of $x$ in $X'_i$ (thus $h_0(x) = n^\eta$). Having defined $X'_i$, let $Y_{i+1} \subseteq X'_i$ be a subset which is $\alpha$-equivalent to an ultrametric and $|Y_{i+1}| \ge |X'_i|^\psi$. We then define $X'_{i+1}$ via

$$h_{i+1}(x) = \begin{cases} \lfloor h_i(x)/2 \rfloor & \text{there exists a copy of } x \text{ in } Y_{i+1}, \\ h_i(x) & \text{otherwise.} \end{cases}$$

Continue this procedure until we arrive at the empty set. Observe that

$$|X'_{i+1}| \le |X'_i| - \frac{1}{2}|X'_i|^\psi \le |X'_i| \left(1 - \frac{1}{2 n^{(1+\eta)(1-\psi)}}\right).$$

Thus $|X'_i| \le n^{1+\eta} \cdot \left(1 - \frac{1}{2 n^{(1+\eta)(1-\psi)}}\right)^i$. It follows that this procedure terminates after $O\left(n^{(1+\eta)(1-\psi)} \log n\right)$ steps, and by construction each point of $X$ appears in $\Theta(\eta \log n)$ of the subsets $Y_i$. As in the proof of Proposition B.1, by selecting each of the resulting partition trees uniformly at random we get a distribution over partition trees $\{\mathscr{P}_k\}_{k=0}^{\infty}$ such that for every $x \in X$,

$$\Pr\left[\forall\, k \in \mathbb{N},\ B_X\left(x, \frac{1}{96\alpha} \cdot 8^{-k}\, \mathrm{diam}(X)\right) \subseteq \mathscr{P}_k(x)\right] \ge \Omega\left(\frac{\eta}{n^{(1+\eta)(1-\psi)}}\right).$$

Optimizing over $\eta \in (0, 1)$, we see that as long as $1 - \psi \ge \frac{1}{\log n}$ we can choose $\eta = \frac{1}{(1-\psi)\log n}$, yielding the probabilistic estimate

$$\Pr\left[\forall\, k \in \mathbb{N},\ B_X\left(x, \frac{1}{96\alpha} \cdot 8^{-k}\, \mathrm{diam}(X)\right) \subseteq \mathscr{P}_k(x)\right] \ge \Omega\left(\frac{1}{(1-\psi)\log n} \cdot \frac{1}{n^{1-\psi}}\right).$$

This estimate is better than Proposition B.1 when $\frac{1}{\log n} \le 1 - \psi \le O\left(\frac{1}{\sqrt{\log n}}\right)$.
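The choice of $\eta$ in the remark above can be checked with a short worked computation. It is a sketch under two assumptions: that the success probability before optimizing is of order $\eta / n^{(1+\eta)(1-\psi)}$, and that $\log$ denotes the natural logarithm (another base changes only the constant).

```latex
\[
  n^{(1+\eta)(1-\psi)}
    = n^{1-\psi} \cdot n^{\eta(1-\psi)}
    = n^{1-\psi} \cdot n^{1/\log n}
    = e \cdot n^{1-\psi}
  \qquad \text{for } \eta = \frac{1}{(1-\psi)\log n},
\]
\[
  \frac{\eta}{n^{(1+\eta)(1-\psi)}}
    = \frac{1}{e\,(1-\psi)\log n \cdot n^{1-\psi}},
  \qquad
  \frac{1}{(1-\psi)\log n \cdot n^{1-\psi}} \ge \frac{1-\psi}{n^{1-\psi}}
  \iff (1-\psi)^2 \le \frac{1}{\log n}.
\]
```

The requirement $\eta \le 1$ is the same as $1 - \psi \ge 1/\log n$, which together with the last equivalence gives a range of the form $1/\log n \le 1 - \psi \le O\left(1/\sqrt{\log n}\right)$.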